ABSTRACT

In The Clash of Civilizations, Samuel Huntington argued that the primary axis of global conflict was no longer ideological or economic but cultural and religious, and that this division would characterize the "battle lines of the future." In contrast to the "top down" approach of previous research, which focused on the relations among nation states, we focused on the flows of interpersonal communication as a bottom-up view of international alignments. To that end, we mapped the locations of the world's countries in global email networks to see if we could detect cultural fault lines. Using IP-geolocation on a worldwide anonymized dataset obtained from a large Internet company, we constructed a global email network. In computing email flows we employed a novel rescaling procedure to account for differences due to uneven adoption of a particular Internet service across the world. Our analysis shows that email flows are consistent with Huntington's thesis. In addition to location in Huntington's "civilizations," our results also attest to the importance of both cultural and economic factors in the patterning of inter-country communication ties.

Categories and Subject Descriptors 

H.3.5 [Information Storage and Retrieval]: Online Information Services; J.4 [Social and Behavioral Sciences]: Sociology

Keywords 

social networks, email, international networks 

1. INTRODUCTION

Are the world's countries realigning in the buildup to a global culture war? Most research has examined international alignments from the top down, based on the relations among nation states. Rather than examining the relations among states, we take a bottom-up view by examining the flows of email between countries, to map global patterns of cross-national integration and division based on the structure of interpersonal social ties among the populations of the world's countries.

*Part of this work was done while the first author was visiting Yahoo! Research Barcelona under the Yahoo! internship program.

Our study extends this line of research on spatial and geographic patterns by examining economic, demographic, and cultural correlates of international communication densities. We estimate these densities using an anonymized collection of email exchanges among users numbering on the order of $10^7$.

To account for the uneven distribution of the email service's market share, we develop a novel procedure for rescaling the communication densities. To do so, we regress the realized between-country density on the number of users in the sample. Using this regression, we then predict the most likely value of the tie count between the full populations of two countries, rather than just between their email users.

Using this network of cross-country affinities, we investigate the covariance of a battery of cultural measures with inter-country flows of interpersonal email communication. Following Huntington, we code countries based on location in one of the "civilizations" that he demarcates, using data derived by Russett, Oneal and Cox [41, 37]. The cultural variables we consider include Hofstede's [25] Power-Distance (PDI), Individualism (IDV), Masculinity (MAS) and Uncertainty Avoidance (UAI), and Bjornskov's generalized trust index [9]. We also examine the role of economic and political factors, including GDP and membership in the European Economic Area. Lastly, we include demographic measures of population size and distance.

Our analysis reveals a large, positive, statistically significant effect of common civilizational membership on between-country communication density. This result provides evidence for a division of the world into civilizational blocs, following Huntington's theory. We find that Huntington's partitioning of countries agrees with the results of community detection algorithms about as well as such algorithms agree with one another. We also uncover effects due to economic inequality, as posited by World Systems Theory, as well as a robust effect of Hofstede's Uncertainty Avoidance measure.
2. RELATED WORK

Geo-social datasets. The emergence and growth of Internet platforms has enabled researchers to study very large networks recorded through Internet communication [29]. The scaling properties of large networks, social and otherwise, emerged from analyses made possible by web data [6, 50]. Methodological advances have opened up new research questions related to community detection in large social networks [15, 3] or to the distribution of shortest paths fi).

New technologies have also facilitated the collection and study of large amounts of geographic data, enabling previously unthinkable studies of the spatial properties of social interaction. Mobile phone geolocation has been used to infer friendship structure [19], and even to measure country-level social networks [18, 11]. Location-based websites such as Foursquare or Gowalla have also proven to be useful data sources for the study of geo-social interaction. Gowalla data have been used to study the relation between social networks and mobility, uncovering a high level of social determination of the local mobility patterns of individuals [42]. Geo-social data, whether web- or cellphone-based, have already proven useful in applications as diverse as epidemiology [42, 48], public transportation [34], estimation of migration rates [52], and event recommendation [44].

A bottom-up approach to measuring transnational social networks has become feasible only recently. Although more accurate methods for collecting geographic information have been developed, limited data coverage has hindered their use in a global setting, as cellphone networks and location-based web services typically cover only one or a few countries. To our knowledge, the earliest study that tackled the issue of global online transnational patterns was conducted by Leskovec and Horvitz (2008), whose measurements provided evidence of large communication flows between countries with colonial pasts (e.g. Portugal and Brazil), countries that are close geographically (e.g. France and Belgium), or countries connected by histories of large migrations (e.g. Germany and Turkey) [32]. In a study of the CouchSurfing international hospitality network, Lauterbach et al. (2009) described the transnational web of hospitality exchange ties connecting the members of the organization [31]. Both studies, however, addressed transnational networks only tangentially, relying only on self-reported location information. Our work addresses this gap in the literature by attempting the first study focused primarily on the international structure of worldwide social networks.

The problem of studying inter-national networks is compounded by the comparative scarcity of between-country ties relative to ties formed within the same country, as a number of recent studies have noted. Scellato et al. (2011) note that most social ties span small geographic distances [43]. This finding is reproduced by Takhteyev et al. (2012) in a study of Twitter networks showing that most connections occur within the same country [47]. Studying a large, global network is therefore necessary to obtain a clear view of global transnational communication networks.

Cross-country Affinity. For most of the postwar period, research on international alignments was informed by World Systems Theory, an approach that emphasized the influence of economic and political factors. Broadly speaking, World Systems Theory posits the existence of a hierarchical structure in international relations, in which a number of core countries (often former colonial empires) engage in the exploitation of peripheral states. Simply put, international alignments are believed to be structured by relations of global economic inequality [14].

In the early 1990s, Samuel Huntington called this economic model into question. In The Clash of Civilizations, he argued that:

"the fundamental source of conflict in this new world will not be primarily ideological or primarily economic. The great divisions among humankind and the dominating source of conflict will be cultural. Nation states will remain the most powerful actors in world affairs, but the principal conflicts of global politics will occur between nations and groups of different civilizations. The clash of civilizations will dominate global politics. The fault lines between civilizations will be the battle lines of the future." [28]

Other scholars have also pointed to cultural correlates of economically structured international alignments. Banfield (1958) argued that Southern and Northern Italy differ fundamentally in cultural norms that account for striking differences in economic development [5]. This idea received further elaboration in Putnam, Leonardi and Nanetti's (1994) work on Italian regionalization, which introduced the idea of differences in the structure of social networks between different societies [39]. More recently, the concept of generalized trust, the extent to which individuals can trust others, has gained credence as a fundamental characteristic of social interaction in different societies, high levels being associated with economic development and efficient institutions [23, 10, 35].

Hofstede's seminal work in the 1980s also identified differences in cultural values on a global scale. Hofstede [25] designed a survey administered to IBM employees from 55 nations, probing a wide range of cultural values related to authority relations, the relationship between individual and society, gender roles, and social and environmental uncertainties. From the IBM study, Hofstede derived a number of cultural dimensions, including the four measures used in our study, for which data are widely available: the power-distance index (PDI), individualism-collectivism (IDV), masculinity-femininity (MAS), and the uncertainty avoidance index (UAI). Power-distance (PDI) measures the extent to which individuals accept unequal power distribution in their social relationships. Individualist (IDV) societies are defined as societies where ties among individuals are loose and members perceive themselves as independent, self-reliant entities endowed with freedom and responsibility. Masculinity (MAS) measures a society's level of distinction in gender roles, where men are expected to be "assertive, tough, and focused on material success" while women are expected to be "modest, tender, and concerned with quality of life" [26]. Hofstede characterizes a country as "masculine" when these gender roles are distinct and "feminine" when they overlap. Uncertainty avoidance (UAI) refers to the extent to which a society is intolerant of, and threatened by, uncertain situations.

3. DATA AND METHODS 

Our work is based on the aggregate analysis of a communication graph composed of a sample of anonymized Yahoo! email users, on the order of $10^7$, observed over a period of 6 months in 2012. An edge was considered to exist between a pair of users whenever the two users exchanged at least one email message in each direction during the observation window. Only users with a minimum degree of one were included in our analysis, and our study included only those users who were not identified as spammers and who had given consent for their email data to be studied. To create the communication graph, our study processed only the email header fields indicating the sender's and recipient's email addresses.
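The edge definition above can be illustrated with a minimal Python sketch. It assumes the header data have already been reduced to (sender, recipient) pairs of anonymized user IDs; the input format and the function names `reciprocal_edges` and `active_users` are hypothetical and do not describe the pipeline used in the study.

```python
from collections.abc import Iterable


def reciprocal_edges(messages: Iterable[tuple[str, str]]) -> set[frozenset[str]]:
    """Undirected edges between users who emailed each other in both directions.

    `messages` holds (sender_id, recipient_id) pairs taken from the sender and
    recipient header fields within the observation window (hypothetical format).
    """
    directed = {(s, r) for s, r in messages if s != r}  # drop self-addressed mail
    # Keep a pair only if both directions were observed at least once.
    return {frozenset((s, r)) for s, r in directed if (r, s) in directed}


def active_users(edges: set[frozenset[str]]) -> set[str]:
    """Users with degree >= 1 in the reciprocal graph."""
    return {user for edge in edges for user in edge}


msgs = [("u1", "u2"), ("u2", "u1"), ("u1", "u3"), ("u4", "u4")]
edges = reciprocal_edges(msgs)
print(len(edges), sorted(active_users(edges)))  # 1 ['u1', 'u2']
```

The one-way message from u1 to u3 and the self-addressed message from u4 produce no edge, so u3 and u4 fall below the minimum degree of one.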

3.1 Inferring Location 

We identified a user's country from two independent sources: the user's self-reported country, as recorded in the user profile database, and IP geolocation. Users often expedite entry of their location by selecting one of the first countries on the drop-down menu (e.g. Afghanistan), and users do not always update their profile when moving to a different country. We therefore combined the self-reported address with information derived from IP geolocation. Our analysis used a protocol similar to that implemented by State, Weber and Zagheni in their study of international migrations [45]. We used the MaxMind GeoCityLite database¹ to extract coarse-grained, city-level geographic information associated with each IP address from which a user logged in during an observation window of about one year. We divided the data into spells in a similar fashion to the protocol used in [45].² We considered the geolocated country of residence to be the modal country from which the user was observed to log in, as per our protocol. Our analysis was further restricted to those users whose self-reported and geolocated countries of residence coincided. We additionally imposed a minimum threshold on the number of valid users per country, discarding countries with too few users in our dataset.
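A minimal Python sketch of the last three steps (modal login country, agreement with the self-reported country, and a per-country minimum) follows, under stated assumptions: login countries are taken as already resolved from IP addresses, the spell-based validation of [45] is omitted, ties in the modal country are treated as unresolved, and `min_users` is a hypothetical parameter whose value the text does not give.

```python
from collections import Counter
from typing import Optional


def modal_country(login_countries: list[str]) -> Optional[str]:
    """Most frequent login country, or None if there are no logins or the mode is tied."""
    top = Counter(login_countries).most_common(2)
    if not top or (len(top) == 2 and top[0][1] == top[1][1]):
        return None  # assumption: ties are left unresolved
    return top[0][0]


def assign_countries(profiles: dict[str, str],
                     logins: dict[str, list[str]],
                     min_users: int) -> dict[str, str]:
    """Keep users whose self-reported and modal login countries agree,
    then drop countries with fewer than `min_users` such users."""
    agreed = {}
    for user, stated in profiles.items():
        if modal_country(logins.get(user, [])) == stated:
            agreed[user] = stated
    per_country = Counter(agreed.values())
    return {u: c for u, c in agreed.items() if per_country[c] >= min_users}


profiles = {"a": "ES", "b": "ES", "c": "BR", "d": "AF"}
logins = {"a": ["ES", "ES", "FR"], "b": ["ES"], "c": ["BR", "BR"], "d": ["DE", "DE", "AF"]}
print(assign_countries(profiles, logins, min_users=2))  # {'a': 'ES', 'b': 'ES'}
```

User d is dropped because the profile country (AF) disagrees with the modal login country (DE), and BR is dropped because only one valid user remains there.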

Next, we collapsed the data to a $c \times c$ matrix of countries, each cell indicating the observed tie density between two countries: the number of observed reciprocal email ties between individuals in the two countries, divided by the total possible number of ties given the total number of individuals observed in each country.
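The density computation can be sketched in NumPy, using the bounds $T_{ij}^{\max}$ defined in Section 3.2 ($N_i N_j$ between countries, $N_i(N_i - 1)/2$ within a country). The function name and array layout are assumptions made for illustration; every country is assumed to have more than one user, as a minimum-user threshold above one would ensure.

```python
import numpy as np


def tie_density(ties, users):
    """Observed tie density t_ij = T_ij / T_ij^max for every pair of countries.

    ties[i][j]: reciprocal email ties between users in countries i and j
                (symmetric; the diagonal holds within-country ties).
    users[i]:   number of valid users N_i observed in country i (assumed > 1).
    """
    ties = np.asarray(ties, dtype=float)
    n = np.asarray(users, dtype=float)
    max_ties = np.outer(n, n)                       # N_i * N_j when i != j
    np.fill_diagonal(max_ties, n * (n - 1) / 2)     # N_i (N_i - 1) / 2 when i == j
    return ties / max_ties


density = tie_density([[3, 2], [2, 1]], [3, 2])
print(density)
```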

3.2 Rescaling Procedure 

Finally, we developed a rescaling procedure to address significant potential biases resulting from differences in market penetration and Internet use. Our aim is to estimate tie densities between two countries while factoring out the effects of uneven data coverage. For instance, our counts could be off by several orders of magnitude between two countries where (due to low Internet use or low market share) we observe a fraction of a percent of the population, as compared with countries where our observations concern ties between 20% of each of the two populations. To properly rescale the communication densities, we wish to produce an accurate estimate of the total number of social ties $T_{ij}$ between countries $i$ and $j$, and of ties within the same country when $i = j$. With $N_i$ denoting the number of users observed in country $i$, $T_{ij}$ is bounded between $0$ and $T_{ij}^{\max} = N_i N_j$ if $i \neq j$, and between $0$ and $T_{ii}^{\max} = N_i(N_i - 1)/2$ if $i = j$. Let $t_{ij} = T_{ij} / T_{ij}^{\max}$ be the proportion of ties observed between countries $i$ and $j$. Furthermore, let $c_i$ be the fraction of country $i$'s adult population $P_i$ that is currently represented in the data. Thus $N_i = c_i P_i$ and $N_j = c_j P_j$.

Assume we move from observing fraction $c_i$ of a country's population to collecting data about all individuals in the country. $N_i$ would then increase by a factor of $1/c_i$ to equal $P_i$. The new maximum count is updated accordingly. If $i \neq j$, then:

$$T_{ij}^{\max\prime} = (c_i c_j)^{-1} \, T_{ij}^{\max}$$

For sufficiently large $N_i$, the same relation holds as an approximation for the case $i = j$. At first blush, the normalization procedure would attempt to hold the density $t_{ij}$ constant, and thus use the formula $T'_{ij} = (c_i c_j)^{-1} \, T_{ij}$ to rescale the observed tie count by the same factor by which the possible tie count increases. This approach would be misguided, however: as social graphs grow by orders of magnitude, density does not stay constant. The simplest way to justify this claim employs Dunbar's number [17], the empirically verified limit (often quoted as 150 alters) on the number of social ties a person can maintain at one point in time. Were a growing graph to maintain constant density, the mean degree would have to grow linearly with the number of nodes, eventually overshooting Dunbar's number.

¹ http://dev.maxmind.com/geoip/geolite

² As our analysis was concerned only with the modal country in which a user was observed, we used slightly less stringent assumptions, however. We allowed valid international transitions to have a maximum implied speed of 1,000 km/h rather than 150 km/h. Additionally, we considered as validly identified through geolocation those users for whom the cumulative duration of valid spells exceeded a threshold of 90 days, rather than 300.

Figure 1: Email communication log-densities, before and after rescaling.

This observation is verified in practice by the graph presented in Figure 1, which plots on logarithmic axes the tie density $t_{ij}$ between a pair of countries $(i, j)$ against the maximum number of ties between them. The graph shows a linearly decreasing bound on maximum density, obtained for the case of ties within the same country. Rescaling the tie counts requires attention not only to the absolute vertex count of the observed graph ($N_i$), but also to the share of a country's population contained in the graph ($c_i$). Web-based services such as Yahoo! do not grow randomly; rather, the graph expands through the social networks of current users. This process resembles multi-seeded snowball sampling, starting from a few unrelated individuals (the early adopters) and expanding to an increasing number of the current users' social contacts who decide to join the network. When only a small proportion of a country's population is included, it is quite plausible that idiosyncratic variation in the likelihood of having social connections with certain other countries is correlated among individuals in the sample, who are likely to be clustered together. Thus, the positive "signal" observed for between-country connections at low levels of market penetration is likely to be overstated relative to the total possible tie count, compared to what one would observe at higher levels of market penetration.

We use a log-linear regression model to predict the expected decrease in the density $t_{ij}$ as a function of the size of each country's user base, counted both in absolute terms and as a proportion of each country's adult population. The regression results support our assumptions, as the predicted density decreases with the number of users and with their share of the population (Table 1). As Figure 1 shows, the data are organized in two clusters, one of within-country densities and the other of between-country densities. We estimate the model jointly for the two clusters, but allow the effect of each predictor to differ between the clusters through the use of interaction effects.

Table 1: Ordinary Least Squares Regression. Response: ln(Between-Country Ties / Total Possible Ties)

Independent Variable                  Coefficient    (S.E.)    T-value
Intercept                               -10.97*      (.26)     -42.12
Users in Country #1†                     -0.36*      (.01)     -36.57
Users in Country #2†                     -0.35*      (.01)     -34.48
Users1/Pop1†                             -0.13*      (.01)     -10.21
Users2/Pop2†                             -0.15*      (.01)     -12.03
Mean Degree of Country #1                -1.68*      (.16)     -10.91
Mean Degree of Country #2                -1.41*      (.18)      -7.97
Mean Degree of Country #1 × #2            1.45*      (.17)      -8.02
Same country                             10.52*     (1.04)      10.15
Same country × Users†                    -0.17*      (.07)      -2.62
Same country × Users/Population†          0.41*      (.08)       4.20
Same country × Mean Degree                1.34*      (.40)       3.08

†: Transformed by taking natural logarithm. Sample size: 9,144. *: p < .001; +: p < .01. Two-tailed tests. Adjusted R²: .58.
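The regression can be sketched with ordinary least squares in NumPy. This is a simplified, hypothetical specification, not the model in Table 1: it uses only log user counts and log population shares, adds a same-country indicator with two of its interactions, omits the mean-degree terms, and fits synthetic data rather than the email dataset.

```python
import numpy as np


def design_matrix(log_users_i, log_users_j, log_share_i, log_share_j, same_country):
    """Hypothetical, simplified design: main effects plus same-country interactions.

    Columns: intercept, ln N_i, ln N_j, ln c_i, ln c_j, same-country indicator,
    and the indicator times ln N_i and ln c_i, so that within- and between-country
    dyads can have different slopes. Mean-degree terms are omitted.
    """
    same = np.asarray(same_country, dtype=float)
    return np.column_stack([
        np.ones_like(log_users_i),
        log_users_i, log_users_j,
        log_share_i, log_share_j,
        same,
        same * log_users_i,
        same * log_share_i,
    ])


def fit_ols(X, y):
    """Ordinary least squares; returns coefficients and (unadjusted) R^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, r2


# Synthetic example: 20 countries, every pair i <= j (diagonal = within-country).
rng = np.random.default_rng(0)
n = 20
log_users = rng.uniform(5, 12, n)        # ln N_i
log_share = rng.uniform(-8, -1, n)       # ln c_i
i, j = np.triu_indices(n)
X = design_matrix(log_users[i], log_users[j], log_share[i], log_share[j], i == j)
true_beta = np.array([-11.0, -0.4, -0.4, -0.1, -0.1, 10.0, -0.2, 0.4])
y = X @ true_beta + rng.normal(0, 0.5, len(i))   # y plays the role of ln t_ij
beta, r2 = fit_ols(X, y)
print(np.round(beta, 2), round(r2, 2))
```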

The adjusted R² of .58 shows that this very simple regression model explains nearly 60 percent of the variance in the observed log-densities, with international dynamics expected to account for the remainder. The model allows an important adjustment: we can derive the expected density $t'_{ij}$ by evaluating the fitted model with each country's adult population as its user base (thus assuming a hypothetical sample of 100% of a country's adults). The estimate $t'_{ij}$ represents the model's best guess as to what the density of ties between a particular pair of countries would be in the hypothetical scenario of a network census being available for both countries. Accordingly,

$$t'_{ij} = \frac{T'_{ij}}{T_{ij}^{\max\prime}} = c_i c_j \, T'_{ij} \, (T_{ij}^{\max})^{-1} \qquad (1)$$

To rescale the ties, we can then divide $t'_{ij}$ by the original density $t_{ij} = T_{ij} / T_{ij}^{\max}$. The resulting fraction gives:

$$\frac{t'_{ij}}{t_{ij}} = c_i c_j \, \frac{T'_{ij}}{T_{ij}}$$

From this relation we can extract the following formula for $T'_{ij}$:

$$T'_{ij} = (c_i c_j)^{-1} \, \frac{t'_{ij}}{t_{ij}} \, T_{ij}$$

In other words, it is now possible to correct the naive rescaling by multiplying by the ratio $t'_{ij} / t_{ij}$, which quantifies how much the density would decrease if the network of all individuals in the two countries were observed. We derive this ratio by dividing the predicted value of $t'_{ij}$ by the predicted value of $t_{ij}$.³
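The rescaling formula can be sketched as below. The two predicted log-densities are assumed to come from a fitted model such as the one in Table 1, evaluated once at current coverage and once with each country's adult population as its user base; the numbers in the example are hypothetical. Since $T_{ij}^{\max\prime} = (c_i c_j)^{-1} T_{ij}^{\max}$, the rescaled density equals the observed density times the predicted ratio, which the second function states directly in log-space.

```python
import numpy as np


def rescale_ties(T_obs, c_i, c_j, pred_log_t_current, pred_log_t_census):
    """T'_ij = (c_i c_j)^{-1} * (t'_ij / t_ij) * T_ij.

    The ratio t'_ij / t_ij is taken between two predictions of the fitted
    log-density model: one at current coverage, one at full coverage.
    """
    ratio = np.exp(pred_log_t_census - pred_log_t_current)
    return T_obs * ratio / (c_i * c_j)


def rescaled_log_density(log_t_obs, pred_log_t_current, pred_log_t_census):
    """Same result in log-space: ln(T'_ij / T_ij^max') = ln t_ij + ln(t'_ij / t_ij)."""
    return log_t_obs + (pred_log_t_census - pred_log_t_current)


# Hypothetical pair: 1,000 and 2,000 observed users covering 1% and 2% of adults.
N_i, N_j, c_i, c_j = 1_000, 2_000, 0.01, 0.02
T_obs = 50
T_max = N_i * N_j                       # observed maximum, N_i N_j
T_max_census = T_max / (c_i * c_j)      # maximum under a census, P_i P_j
T_new = rescale_ties(T_obs, c_i, c_j, pred_log_t_current=-10.0, pred_log_t_census=-12.0)
print(round(T_new), np.log(T_new / T_max_census))
```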

4. INTERNATIONAL STRUCTURE IN SOCIAL NETWORKS



³ We divide by the predicted, rather than the observed, density under the current sampling conditions so as not to impose more assumptions on the data than necessary. Dividing by the observed density would have imposed a strictly linear relation between the population of countries and the tie count, thus eliminating precisely the variance that is the object of further study in this paper.



The pairwise densities estimated through the rescaling procedure can be represented as a weighted network of cross-country connections. Figure 2(a) represents the 100 largest between-country ties in terms of their absolute size. As expected, most of these ties occur between countries with large populations. Though useful in identifying the highest-magnitude cross-country ties, this representation communicates little about the deeper structure of the world's social networks. A more useful picture can be obtained by inspecting the top 100 ties between countries as judged by their rescaled densities ($t'_{ij}$), defined as the ratio between the rescaled raw tie counts and the total number of possible ties (Equation 1). Given that such values are extremely low even for the smallest countries, all of our calculations are carried out in log-space to prevent numerical underflow and to drastically improve model fit.

We obtain a graph of 141 countries and 7,246 ties out of 9,870 possible.⁴ The tie weights are derived from the logarithm of the rescaled communication densities. Because these logarithms are negative, the minimum observed log-density ($\ln t'_{\min}$)⁵ is subtracted from each cell of the adjacency matrix:

$$w_{ij} = \ln t'_{ij} - \ln t'_{\min}$$

The resulting edge weights $w_{ij}$ thus indicate how many times over (in powers of $e$) a given between-country rescaled density $t'_{ij}$ exceeds the minimum rescaled density $t'_{\min}$. The graph therefore encodes a logarithmic measure of affinity between countries, indicating orders of magnitude rather than absolute counts.
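The weighting step can be sketched in NumPy; the same lines also check the arithmetic of footnote 5. Encoding pairs without ties as `-inf`, and giving them weight 0, is an assumption of this sketch rather than a detail stated in the text.

```python
import numpy as np


def affinity_weights(log_density):
    """Edge weights w_ij = ln t'_ij - ln t'_min over observed off-diagonal cells.

    Pairs without ties (after thresholding) are assumed to be encoded as -inf
    and receive weight 0 in this sketch.
    """
    ld = np.array(log_density, dtype=float)
    np.fill_diagonal(ld, -np.inf)              # the country graph has no self-loops
    observed = np.isfinite(ld)
    w = np.zeros_like(ld)
    w[observed] = ld[observed] - ld[observed].min()
    return w


ld = [[-12.0, -20.0, -25.0],
      [-20.0, -11.0, -np.inf],
      [-25.0, -np.inf, -13.0]]
w = affinity_weights(ld)
print(w)

# Footnote 5, checked numerically: e^29.36 is about 5.6 trillion, e^18 about
# 66 million, and e^(29.36 - 18) about 86,000.
print(f"{np.exp(29.36):.3g} {np.exp(18):.3g} {np.exp(29.36 - 18):.3g}")
```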

Figure 3 plots the top 1,000 ties observed in the above-described graph, laid out according to the Fruchterman-Reingold algorithm [22, 16], with nodes colored according to their presumed civilizational membership. Upon visual inspection, the graph provides evidence for Huntington's theory: it shows clear clusters according to civilization. The Latin American cluster is the most striking, set off from the rest of the countries in one region of the graph, with Spain and Portugal, the former colonial metropolises, acting as intermediaries between this civilization and the Western civilization. The Western civilization likewise occupies its own clear region of the graph, with the exception of the Philippines and Papua New Guinea, two countries which can be judged as marginal to the Western bloc. The Orthodox civilization (ochre) is contiguous with the Western (blue) region of the graph, with Greece and Kazakhstan lying between the Orthodox cluster and the Western and Islamic regions, respectively. The Islamic civilization appears less coherent, with Central Asian, Middle Eastern and North African countries in separate regions, though with some level of contiguity. Sub-Saharan African countries appear torn between two tendencies: to connect within their civilization, or to connect outside it, either to Western former colonial powers or to Middle Eastern countries, with which some Sub-Saharan African countries share religious affinities.

⁴ We imposed a threshold for each count: country pairs with too few connections were recorded as having none.

⁵ The natural logarithm of the minimum rescaled density is -29.36, corresponding to one expected cross-border tie between individuals in two countries for every 5.6 trillion possible. A rescaled log-density of -18 corresponds to one tie for every 66 million. By subtracting the minimum value of -29.36 from -18, we get instead a value of 11.36, indicating that this density is about 86,000 times greater than the minimum rescaled density.

The visual representation of the graph shows clear hints of a correlation between the labeling of countries according to civilization and the obtained structure of the world's communication network. Indeed, the adjacency matrix obtained by creating a graph of co-civilizational memberships has a product-moment correlation coefficient of .397 with the adjacency matrix of the rescaled communication network.⁶ This result's statistical significance is bolstered by a test using the Quadratic Assignment Procedure (QAP) [30]. Given a certain graph structure (i.e., the communication network) and a certain set of vertex labels (i.e., civilizational membership), QAP computes permutations of the labels, thus generating alternative, random partitions of the world's countries. No such permutation approaches the obtained correlation coefficient: out of 10,000 random civilizational assignments, the highest correlation coefficient obtained was .059, less than a sixth of the correlation obtained using Huntington's labeling.⁷
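A QAP-style permutation test can be sketched in NumPy. This is a simplified re-implementation for illustration, not the `qaptest` routine from the sna R package used for the reported result: it correlates the upper triangle of a weighted adjacency matrix with a co-membership matrix and builds the null distribution by permuting the labels across vertices, which is equivalent to permuting the rows and columns of the co-membership matrix together. The synthetic data and function names are hypothetical.

```python
import numpy as np


def upper_triangle(m):
    """Off-diagonal upper-triangle entries (undirected graph, each pair once)."""
    return m[np.triu_indices_from(m, k=1)]


def co_membership(labels):
    """1 where two countries share a label (e.g. a civilization), else 0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)


def qap_test(weights, labels, n_perm=1_000, seed=0):
    """Correlation between weights and label co-membership, with a QAP-style null.

    Returns (observed r, largest permuted r, one-sided p-value).
    """
    rng = np.random.default_rng(seed)
    w = upper_triangle(np.asarray(weights, dtype=float))
    labels = np.asarray(labels)

    def corr(lbl):
        return np.corrcoef(w, upper_triangle(co_membership(lbl)))[0, 1]

    observed = corr(labels)
    null = np.array([corr(rng.permutation(labels)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null.max(), p_value


# Synthetic check: 18 countries in 3 blocs, with stronger ties within blocs.
rng = np.random.default_rng(42)
blocs = np.repeat([0, 1, 2], 6)
W = 1.0 + 5.0 * co_membership(blocs) + rng.normal(0, 0.1, (18, 18))
r_obs, r_max, p = qap_test(W, blocs, n_perm=200)
print(round(r_obs, 3), round(r_max, 3), p)
```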

Another observation related to Figure 3 concerns the central position of the Western civilization compared to the others. To test whether Western countries are truly central to the derived communication graph, we compute three measures of centrality, reported in Table 3. Degree centrality indicates the total weighted degree of each country. This measure translates to $\deg_i = \sum_{j \neq i} \ln(t'_{ij} / t'_{\min})$, where $t'_{\min}$ is the minimum observed density.

Western countries have the highest mean degree centrality (1302.4), followed by Sinic (1076.8) and Islamic (1029.6) countries, while the lowest values are recorded for Latin American (904.4) and African (806.6) countries. Eigenvector centrality [12] indicates the extent to which a country has large rescaled communication densities with other countries that have similarly large densities. Western countries are again at the top of this ranking, with a mean centrality score of .101, followed by Orthodox, Sinic and Islamic countries, with scores of .084, .083 and .081, respectively, while African countries have the lowest score (.064). Betweenness centrality [21] represents an alternative conceptualization of position in the network, expressing the extent to which a country lies on the (weighted) shortest paths between other countries in the graph. Sinic countries have the highest betweenness score (60.79), followed by Hindu (58.46) and Western (58.06) countries, whereas the lowest scores are recorded for Orthodox (27.79) and Latin American (17.25) countries.
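Weighted degree and eigenvector centrality can be sketched in NumPy, with a small helper that averages scores within each civilization in the manner of Table 3. This is an illustration, not the sna routines used for Table 3; betweenness is omitted because it depends on how weights are converted to path lengths, which the text does not specify.

```python
import numpy as np


def weighted_degree(w):
    """deg_i = sum over j != i of w_ij, where w_ij = ln(t'_ij / t'_min)."""
    w = np.array(w, dtype=float)
    np.fill_diagonal(w, 0.0)
    return w.sum(axis=1)


def eigenvector_centrality(w, tol=1e-12, max_iter=10_000):
    """Leading eigenvector of a symmetric, nonnegative weight matrix via power
    iteration, scaled to unit Euclidean norm (scaling conventions differ between
    packages, so values need not match Table 3 exactly)."""
    w = np.array(w, dtype=float)
    np.fill_diagonal(w, 0.0)
    shifted = w + np.eye(len(w))  # same leading eigenvector; avoids oscillation on bipartite graphs
    x = np.ones(len(w)) / np.sqrt(len(w))
    for _ in range(max_iter):
        x_new = shifted @ x
        x_new /= np.linalg.norm(x_new)
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x_new


def mean_by_civilization(scores, labels):
    """Average a per-country score within each civilization label."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    return {str(c): float(scores[labels == c].mean()) for c in np.unique(labels)}


star = [[0, 1, 1],
        [1, 0, 0],
        [1, 0, 0]]
print(weighted_degree(star), np.round(eigenvector_centrality(star), 4))
```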

Seen from the perspective of network analysis, Huntington's effort may be conceived as an attempt to partition the world's graph of between-country affinities into a series of communities. A simple question to ask is whether one could do "better" than Huntington at partitioning the world's countries into civilizational blocs, based on the structure observed in the communication network. We compared Huntington's assignments to those made by three community detection algorithms for weighted undirected graphs: the Spinglass algorithm [40], the Walktrap algorithm [38], and the greedy algorithm proposed by Clauset, Newman and Moore [15].⁸ Cross-tabulated community assignments are shown in Table 2. The African, Latin American, Orthodox, Hindu and Sinic civilizations appear to be particularly consistent: all countries in each of these five civilizations are assigned to the same community by at least two of the algorithms. An examination of the Rand index [27] reveals that the best agreement occurs between Huntington's assignments and the Walktrap algorithm, the two partitions agreeing on whether a pair of countries belongs to the same community for about 42.7% of all pairs. By comparison, the Rand index between Walktrap and Spinglass was 39.7%, and between Walktrap and Greedy, 28%.
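The Rand index as described here, the share of country pairs on which two partitions agree, can be computed by direct pair counting. This sketch implements the unadjusted index; chance-corrected (adjusted) variants exist and yield different values, and the labels in the example are hypothetical.

```python
from itertools import combinations


def rand_index(a, b):
    """Unadjusted Rand index: share of item pairs on which two partitions agree,
    i.e. both put the pair in the same community or both put it in different ones."""
    if len(a) != len(b):
        raise ValueError("partitions must label the same items")
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Hypothetical labels for five countries.
huntington = ["Western", "Western", "Latin American", "Latin American", "Orthodox"]
walktrap = [1, 1, 2, 2, 2]
print(rand_index(huntington, walktrap))  # 0.8
```

Only the label partition matters, not the label names: the two partitions disagree on 2 of the 10 pairs, both involving the fifth country.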



⁶ Correlation calculated using the gcor function in the sna R package [13, 30]. Adjacency matrix formed by the natural logarithm of rescaled communication densities, first normalized by subtracting the lowest density obtained.

⁷ Performed using the qaptest procedure in the sna R package [13, 30].

⁸ We ran this analysis using the igraph R package [16].



Table 2: Community Detection Results across Civilizations

Civ.          Spinglass      Walktrap        Greedy
              (4 comm.)      (8 comm.)       (3 comm.)
African       28             17 2 2 2 5      28
Buddhist      1 5            1 3 2           1 5
Hindu         2              2               2
Islamic       7 8 17         8 2 17 5        8 24
Lat. Am.      19             19              19
Orthodox      12             8 4             12
Sinic         4              4               3 1
Western       4 17 4 8       00 24 0504      20 8 5

Cross-tab
Rand Index    0.371          0.427           0.271
χ² stat.      239.84         352.14          172.83
dF            21             49              14





Table 3: Mean Weighted Centrality Scores, by Civilization

                       Centrality
Civilization      Degree     Betweenness   Eigenvector
African            806.6        32.06         0.064
Buddhist           949.2        31.32         0.076
Hindu              914.8        58.46         0.073
Islamic           1029.6        33.83         0.081
Latin American     904.4        17.25         0.072
Orthodox          1052.9        27.79         0.084
Sinic             1075.8        60.79         0.083
Western           1302.34       58.06         0.101

Source: Yahoo! email dataset. Rescaled densities. Statistics based on the adjacency matrix of the natural logarithm of rescaled communication densities, transformed by subtracting the minimum observed value. Values calculated using the sna package in R [13].



5. DETERMINANTS OF INTER-COUNTRY CONNECTIONS

While it appears that Huntington's assignment of countries to civilizations is not unlike that of a community detection algorithm, the question of spuriousness must be considered. Could it be, for instance, that Latin American countries are strongly connected by mere accident of geographic proximity? Or could flights, colonialism, or perhaps trade flows account for the effect we witness? To distinguish between the multiple factors influencing between-country communication, we used a linear mixed-effects regression to model the edge magnitudes of a nearly complete⁹ weighted graph of the log-transformed pairwise communication densities between the 50 countries for which complete data were available for all variables of interest. By including random effects for each country, the model allows us to control for country-specific tendencies to attract more social ties due to unobserved factors.
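The role of the country random effects can be illustrated with a small NumPy sketch of a multiple-membership design, in which each dyad's log-density depends on a random effect for each of its two countries. This is a simplified stand-in, not the study's model or software: it solves Henderson's mixed-model equations with the variance ratio `lam` fixed in advance instead of estimated, and it uses synthetic data with a single dyadic predictor.

```python
import numpy as np


def fit_dyadic_mixed_model(X, i, j, y, n_countries, lam):
    """Henderson's mixed-model equations for y_d = X_d beta + u_i(d) + u_j(d) + e_d.

    Each dyad d = (i, j) loads on the random effects of both of its countries.
    `lam` = sigma_e^2 / sigma_u^2 is treated as known; a full LME fit would also
    estimate the variance components (e.g. by REML). Returns (beta, u).
    """
    n_dyads = len(y)
    rows = np.arange(n_dyads)
    Z = np.zeros((n_dyads, n_countries))
    Z[rows, i] += 1.0
    Z[rows, j] += 1.0
    p = X.shape[1]
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + lam * np.eye(n_countries)]])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    solution = np.linalg.solve(lhs, rhs)
    return solution[:p], solution[p:]


# Synthetic data: 50 countries, all 1,225 between-country dyads.
rng = np.random.default_rng(1)
n = 50
civ = rng.integers(0, 5, n)                       # hypothetical civilization labels
i, j = np.triu_indices(n, k=1)
same_civ = (civ[i] == civ[j]).astype(float)
X = np.column_stack([np.ones(len(i)), same_civ])
u_true = rng.normal(0.0, 1.0, n)                  # country-level propensities
y = -20.0 + 2.0 * same_civ + u_true[i] + u_true[j] + rng.normal(0.0, 0.5, len(i))
beta, u = fit_dyadic_mixed_model(X, i, j, y, n, lam=0.25)
print(np.round(beta, 2))
```

Here `lam = 0.25` matches the synthetic noise and effect variances ($0.5^2 / 1^2$); with real data that ratio is unknown and would have to be estimated.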

Cultural Factors. Using this network of cross-country affinities, we investigate whether a battery of cultural measures covaries with inter-country flows of interpersonal email communication. Following Huntington, we code countries based on location in one of the eight civilizations he demarcates, as coded by Russett, Oneal and Cox [41, 37]. A shared language should likewise increase the two countries' level of reciprocal affinity. At the very least, a shared language enables communication, a logical prerequisite for the creation of new ties between the inhabitants of two countries [36]. We use data on former colonial relationships between countries as recorded by Neumayer [36], following his distinction between Commonwealth and non-Commonwealth countries.



9 There were 1,221 observed counts out of a total of 1,250 possible.




Figure 2: Top 100 rescaled counts, raw and normalized. (a) Rescaled
counts; (b) normalized by population size.



Additionally, we include four of Hofstede's [25] cultural measures:
Power-Distance (PDI), Individualism (IDV), Masculinity (MAS) and
Uncertainty Avoidance (UAI).¹⁰ To these four, we add a fifth cultural
dimension, the generalized trust index, derived by Bjornskov from a
meta-analysis of available studies [10].

Economic Factors. Our analysis is focused on two sets of economic
predictors of between-country communication: development and trade.
We measure economic development as the average 2011 GDP per capita
of each country pair, collected from the World Bank [1]. To account
for the prediction of World Systems theory regarding the existence
of a hierarchical structure in international relations, we likewise
included the absolute difference between the two countries' GDPs
per capita. Additionally, we include in our analysis a measure of
bilateral trade derived from the Correlates of War Dyadic Trade
dataset [8, 7]. We defined a dyadic trade flow as the 2011 US dollar
value of goods exchanged between two countries. A country's total
trade was defined as the sum of its total imports and exports. We
define the trade affinity between two countries as the natural
logarithm of the ratio between the dyadic trade flow and the
geometric mean of the two countries' total trade values. Because 49%
of all possible pairs of countries in our 50-country dataset had no
recorded trade in the Correlates of War dataset, we used mean
imputation to address missingness in the trade affinity variable.¹¹
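A minimal sketch of the trade affinity, ln(T_ij / √(T_i T_j)), and the mean-imputation step follows, where T_ij is the dyadic flow and T_i, T_j are the two countries' total trade. It assumes country pairs are stored as tuples in a fixed order and treats a pair as unrecorded when no positive flow is present; the function names are hypothetical.

```python
import math


def trade_affinity(dyadic_flow, total_trade):
    """ln(T_ij / sqrt(T_i * T_j)) for every pair with a positive recorded flow.

    dyadic_flow: {(a, b): USD value of goods exchanged between a and b}
    total_trade: {country: imports + exports}"""
    return {
        (a, b): math.log(flow / math.sqrt(total_trade[a] * total_trade[b]))
        for (a, b), flow in dyadic_flow.items()
        if flow > 0  # assumption: zero flows count as unrecorded trade
    }


def mean_impute(affinity, all_pairs):
    """Give every pair without an observed affinity the mean observed value."""
    mean = sum(affinity.values()) / len(affinity)
    return {pair: affinity.get(pair, mean) for pair in all_pairs}


if __name__ == "__main__":
    totals = {"A": 100.0, "B": 400.0, "C": 100.0}
    observed = trade_affinity({("A", "B"): 20.0, ("A", "C"): 50.0}, totals)
    print(mean_impute(observed, [("A", "B"), ("A", "C"), ("B", "C")]))
```

Dividing by the geometric mean of total trade expresses each dyadic flow relative to the two countries' overall trade volumes rather than in absolute dollars.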



10 The other two Hofstede measures, Long-Term Orientation (LTO)
and Indulgence vs. Restraint (IVR), were not included in our analysis
because of insufficient coverage of the former and potential
methodological issues created by the latter measure's comparative
novelty.

11 The mean imputation strategy assumes that pairs of countries for 
which no trade data is observed have trade affinities equal to the 
mean affinity recorded. Mean imputation was found to improve 
model fit compared to an alternative strategy of min-imputation, in 
which unobserved trade affinities were assumed to be equal to the 
minimum trade affinity recorded. Regardless of imputation strat- 
egy, the effect due to common civilization was found to persist 
across the models we estimated. 



Controls. Additionally, our analysis includes several controls for
factors that may systematically influence the between-country density
of ties. These include measures related to countries' populations,
geographic factors related to location, and administrative factors
which may affect tie formation and maintenance.

Our response variable is the between-country social affinity, defined
as the ratio between the rescaled inter-country tie count and the
maximum number of possible inter-country ties. This latter quantity,
defined as the product of two countries' populations, is attainable
only in theory, however. As individuals can maintain only a limited
number of social ties [17, 24], it is impossible in practice for two
countries to approach this theoretical maximum. This is particularly
important for large countries, between which densities are likely to
be particularly small as a consequence of the countries' large
populations. Thus, it is imperative for our model to include a
control for the countries' populations. Our measure of population
uses the natural logarithm of the geometric mean of each pair of
countries' populations, as derived from World Bank 2011 data [1]. To
allow for effects due to the (potentially) peculiar nature of
densities between countries of unequal populations (e.g., the U.S.
and Barbados), we also include the log-transformed ratio between the
larger and the smaller country populations.
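For a single country pair, the response and the two population controls described above can be sketched as follows. The rescaled tie count is taken as given (the rescaling procedure itself is not reproduced here), and the function and key names are hypothetical.

```python
import math


def pair_response_and_controls(rescaled_ties, pop_a, pop_b):
    """Log-density response plus the two population controls for one pair."""
    max_ties = pop_a * pop_b  # theoretical maximum number of cross-country ties
    return {
        "log_density": math.log(rescaled_ties / max_ties),
        # ln of the geometric mean population, ln sqrt(pop_a * pop_b)
        "pop_avg": math.log(math.sqrt(pop_a * pop_b)),
        # ln(larger / smaller): zero for equally sized countries, symmetric in the pair
        "pop_ratio": math.log(max(pop_a, pop_b) / min(pop_a, pop_b)),
    }


if __name__ == "__main__":
    print(pair_response_and_controls(100.0, 1e6, 1e4))
```

Since ln √(P_a P_b) = (ln P_a + ln P_b)/2, the average control absorbs the mechanical shrinking of densities between populous countries, while the ratio term captures only their imbalance.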

We also control for distance, given that the density of social ties
has been shown to decay exponentially with distance, a finding by
Milgram [49] which has been replicated many times using offline [51]
and online data [32, 47, 33]. Thus, the farther apart the two
countries, the fewer the expected ties between them, ceteris paribus.
We use the log-transformed distance between the two countries'
centroids, as derived by Neumayer [36]. Seeking to account for
unevenness in the distribution of country sizes across the world,
our analysis also includes a measure, collected by the Correlates of
War project, of whether or not two states' territories are
contiguous, either through their mainlands or through their colonial
dependencies [46]. Another factor we consider is air travel, which
Takhteyev et al. (2012) found to be very strongly correlated with
the geographic structure of the Twitter online social network [47].
Figure 3: The Mesh of Civilizations

Source: Yahoo! email dataset. Rescaled densities. Only top 1,000 densities displayed. Colors indicate Huntingtonian civilization, as collected
by [41] and provided by [36]. Layout using the weighted Fruchterman-Reingold algorithm [22], as implemented in the igraph R package [16].
Layout based on the full graph of rescaled communication densities, using the monotonic transformation f(x) = [(x - min(x))/range(x)]^4, where x
is the natural logarithm of the communication density. Only countries with more than 1m inhabitants as per [1] included. Observations on
Somalia, Myanmar and the Palestinian Territories excluded.
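The caption's monotonic transformation can be sketched as below. It maps log-densities onto [0, 1] and raises them to the fourth power, which compresses weak ties toward zero so that strong ties dominate the force-directed layout. The function is a hypothetical helper that only prepares edge weights; the igraph layout call itself is not shown.

```python
import math


def layout_weights(densities, power=4):
    """f(x) = ((x - min x) / range x) ** power, with x = ln(density)."""
    xs = [math.log(d) for d in densities]
    lo, hi = min(xs), max(xs)
    if hi == lo:
        raise ValueError("range(x) is zero; the transformation is undefined")
    return [((x - lo) / (hi - lo)) ** power for x in xs]


if __name__ == "__main__":
    print(layout_weights([math.exp(-10), math.exp(-8), math.exp(-6)]))
```

Because f is monotonic, the ordering of ties is preserved; only their relative pull in the layout changes.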



Table 4: Linear mixed-effects model. Response: Log-Density

FIXED EFFECTS
Indep. Var.                  Coef.       (S.E.)     T-value   1-Var. Coef.
Intercept                   -10.027***   (1.634)    -6.138
Economic Factors
  Mean GDP ($1000s)           0.015*     (0.008)     1.901     0.053***
  Dif. GDP ($1000s)           0.015***   (0.002)     8.000     0.006***
  Trade Affinity              0.084***   (0.021)     4.042     0.354***
Cultural Factors
  Common Civilization         0.663***   (0.089)     7.441     1.340***
  PDI Mean                    0.001      (0.006)     0.122    -0.045***
  PDI Diff.                   0.001      (0.002)     0.518     0.004***
  IDV Mean                    0.008      (0.007)     1.125     0.042***
  IDV Diff.                   0.014***   (0.002)     8.337    -0.002
  MAS Mean                   -0.003      (0.006)    -0.406    -0.006
  MAS Diff.                  -0.004*     (0.002)    -1.911    -0.002
  UAI Mean                   -0.010**    (0.004)    -2.319    -0.006***
  UAI Diff.                  -0.010***   (0.001)    -7.041    -0.002***
  Gen. Trust Mean            -0.020**    (0.008)    -2.230     0.038***
  Gen. Trust Diff.            0.003      (0.002)     1.253    -0.002
  Common Language             0.976***   (0.101)     9.585     2.468***
  Colonial Link               1.281***   (0.208)     6.162     1.811***
  Commonwealth Link           0.214      (0.145)     1.475     1.540***
Controls
  Population Avg.†           -0.433***   (0.093)    -4.644    -0.604***
  Population Dif.‡           -0.024      (0.025)    -0.959    -0.049
  Ln(Distance)               -0.749***   (0.060)   -12.379    -1.085***
  Same Region                 0.198*     (0.109)     1.808     1.849***
  Contiguous Border          -0.253**    (0.109)    -2.323     1.721***
  Visa Required              -0.127*     (0.064)    -1.985    -0.630***
  Ln(Direct Flights + 1)      0.196***   (0.035)     5.557     0.735***
  Both in E.E.A.             -0.390***   (0.118)    -3.294     1.287***

RANDOM EFFECTS
               Variance   Std. dev.
Country 1       0.182      0.427
Country 2       0.147      0.384
Residual        0.512      0.715

† ln √(Pop_i · Pop_j). ‡ ln(max(Pop_i, Pop_j)/min(Pop_i, Pop_j)). Sample size:
1,221 relations (50 countries). Scaled deviance: 2,816. Log-likelihood:
-1,491. AIC: 3,041. * p < .10, ** p < .05, *** p < .01. Two-tailed tests.
McFadden R²: .292. One-variable model estimated using the same data as
the main model, with the same random effects and a sole fixed effect for
the variable of interest.

To account for the effect of air travel, we use the natural logarithm
of one plus the cumulative number of direct airline flights between
each pair of countries, as recorded in the OpenFlights database [2].

Using data collected by Neumayer [36], we also measured potential
administrative barriers to the creation of cross-country ties.
For instance, if visa regimes make it difficult for the residents of 
one country to travel to another country, then one would expect 
fewer ties to exist between the two countries. Given the importance 
of European integration, we also coded countries for membership 
in the European Economic Area. 

Results. We present our estimates in Table 4. The results provide
support for economic as well as cultural explanations. All else
being equal, wealthier countries are more likely to communicate
with one another. Tie density increases by 1.5% for each additional
thousand dollars in a pair of countries' mean 2011 GDP per capita.
As expected, inequality between two countries' GDPs translates into
higher communication densities, to the tune of a 1.5% increase for
each additional thousand dollars separating



Table 5: Civilizations in LMER Model (Selected Coefs.)

FIXED EFFECTS
Indep. Var.         Coef.      (S.E.)    T-value
Sinic               0.689      (0.427)    1.613
Islamic             1.133***   (0.176)    6.428
Latin American      1.694***   (0.177)    9.561
Western            -0.155      (0.142)   -1.094
Orthodox            0.878**    (0.425)    2.069
African            -0.647      (0.444)   -1.456
Buddhist            1.191      (0.753)    1.581

Model contains the same covariates as the main model in Table 4, with the
exception of Common Civilization. Sample size: 1,221 relations (50 countries).
Scaled deviance: 2,741. Log-likelihood: -1,454. AIC: 2,977. * p < .10,
** p < .05, *** p < .01. Two-tailed tests. McFadden R²: .310.

the two countries' GDPs. For every doubling of the trade flows between
two countries (holding their total trade fixed), the model predicts an
increase in the rescaled communication density by a factor of 1.06
(2^0.084 = e^(0.084 · ln 2)).
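Because both the response and the trade affinity are on a natural-log scale, the effect of a doubling follows from a one-line derivation. Writing z for the untransformed quantity and β for its coefficient, and holding all other covariates fixed (for trade affinity, this includes the two countries' total trade):

```latex
\ln d(2z) - \ln d(z) = \beta\,[\ln(2z) - \ln z] = \beta \ln 2
\quad\Longrightarrow\quad
\frac{d(2z)}{d(z)} = e^{\beta \ln 2} = 2^{\beta}
```

The same rule applies to any predictor entered as a natural logarithm, such as Ln(Distance) or Ln(Direct Flights + 1) in Table 4; for the latter, z is the number of flights plus one.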

Cultural factors also affect between-country social affinities. To
test for cultural correlates of international alignment, we included
four of Hofstede's [25] cultural dimensions: Power-Distance (PDI),
Individualism (IDV), Masculinity (MAS) and Uncertainty Avoidance
(UAI). In addition, we included a measure indicating common
membership in one of the above-mentioned civilizational blocks as a
direct test of the "clash of civilizations" hypothesis. Common
membership in the same Huntingtonian civilization nearly doubles the
expected pairwise density, increasing it by a factor of 1.941
(e^0.663). The effects of the Hofstede measures also confirm the
expected cultural homophily based on Masculinity and Uncertainty
Avoidance, but not on PDI. Each additional point of difference (for
variables measured on 100-point scales) yields a decrease in the
rescaled tie density of 0.4% for MAS and 1% for UAI, while
communication density decreases by 1% for each additional point
increase in the mean UAI value of a pair of countries. Surprisingly,
cultural similarity on the IDV dimension reduces pairwise density,
the opposite of what we expected: for each point increase in the
pairwise IDV difference, we observe a 1.3% increase in pairwise
density. A shared official language has the expected strong effect,
increasing pairwise tie densities by a factor of 2.65 (e^0.976).
Additionally, non-Commonwealth colonial relations increase
communication density by a factor of 3.6 (e^1.281), while the effect
of Commonwealth relations is not found to be statistically different
from 0.
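The factors quoted for binary indicators and for 100-point scales follow from exponentiating the Table 4 coefficients, since the response is ln(density). A minimal sketch, with hypothetical helper names:

```python
import math


def multiplicative_effect(coef, delta=1.0):
    """Factor by which density changes when a predictor rises by `delta`
    units, for a coefficient `coef` in a model of ln(density)."""
    return math.exp(coef * delta)


def percent_change(factor):
    """Signed percentage change implied by a multiplicative factor."""
    return 100.0 * (factor - 1.0)


if __name__ == "__main__":
    # Coefficients as reported in Table 4 (main model).
    table4 = {
        "Common Civilization": 0.663,
        "Common Language": 0.976,
        "Colonial Link": 1.281,
        "UAI Diff. (per point)": -0.010,
        "MAS Diff. (per point)": -0.004,
    }
    for name, coef in table4.items():
        factor = multiplicative_effect(coef)
        print(f"{name}: x{factor:.3f} ({percent_change(factor):+.1f}%)")
```

For small coefficients, e^β - 1 ≈ β, which is why a coefficient of -0.010 reads directly as a decrease of about 1% per point.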

With one exception, all control variables have significant effects
on the dependent variable. The expected tie density decreases by 26%
(2^-0.433 ≈ 0.74) for each doubling of the geometric mean population.
As expected, tie densities decrease sharply with distance, with a 40%
drop (2^-0.749 ≈ 0.60) for each doubling of distance. Curiously,
countries with contiguous borders have lower expected densities (by
22%), ceteris paribus. Another counter-intuitive result concerns
joint membership in the European Economic Area, which is found to
reduce density by 32% compared to what the model would predict
otherwise. Visa regimes are predicted to reduce tie density by 12%
for country pairs with unilateral or bilateral travel visa
restrictions. As expected, more direct flights result in a higher
tie density, which is predicted to increase by 15% (2^0.196 ≈ 1.15)
for every doubling of the number of direct flights between two
countries.
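For the log-transformed predictors, the effect of a doubling is 2^β, equivalently e^(β ln 2). A sketch with a hypothetical helper, using the Table 4 coefficients; for Ln(Direct Flights + 1) the doubled quantity is the flight count plus one, and for trade affinity the two countries' total trade is held fixed.

```python
import math


def doubling_effect(coef):
    """Factor by which density changes when the untransformed quantity
    behind a natural-log predictor doubles: exp(coef * ln 2) = 2 ** coef."""
    return math.exp(coef * math.log(2.0))


if __name__ == "__main__":
    # Coefficients on log-scale predictors, as reported in Table 4.
    log_scale_coefs = {
        "Population Avg.": -0.433,
        "Ln(Distance)": -0.749,
        "Ln(Direct Flights + 1)": 0.196,
        "Trade Affinity": 0.084,
    }
    for name, coef in log_scale_coefs.items():
        factor = doubling_effect(coef)
        print(f"{name}: x{factor:.3f} ({100 * (factor - 1):+.0f}%)")
```

The coefficient multiplies ln 2 rather than being divided by it: doubling z adds ln 2 to ln z, and that increment is scaled by β.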

A potential issue with the results concerns the effect of model
specification on the estimates. To obtain a qualitative assessment of
how much our particular choice of covariates affects our findings, we
estimated a separate model for each independent variable, using the
same linear mixed-effects specification and the same dataset as in
the main model, but with only the variable of interest as a fixed
effect. With a few exceptions, the one-variable estimates do not
deviate qualitatively from the main model's estimates. While the
coefficients of economic factors are robust to this comparison of
model specifications, there is a great deal of disagreement between
the one-variable models and the full model with respect to cultural
factors. The only three cultural variables for which the sign,
statistical significance and order of magnitude of the estimates are
preserved are Common Civilization and the UAI Mean and Difference.
Most controls are likewise robust to this comparison, with the
exception of contiguous borders and common EEA membership, the two
factors that yielded unexpected findings. Here, as in the case of the
non-robust Hofstede measures (all but UAI), we note the existence of
interesting patterns, but we caution the reader against a definitive
interpretation of the main model estimates, as their signs and
magnitudes appear to be sensitive to specification.

Our analysis reveals a large, positive, statistically significant
effect of common civilizational membership on between-country
communication density. This result provides evidence for a division
of the world into civilizational blocks following Huntington's
theory. As Table 5 reveals, however, not all civilizational blocks
are equally consistent. The table shows selected estimates from a
model having the same specification as the main mixed-effects model
presented in Table 4, but that separates the "common civilization"
variable according to each civilization.¹² Three civilizations
(Latin American, Islamic and Orthodox) have strong and significant
effects when considered separately from one another. Indeed, for
these civilizations the predicted effect on tie density is even
higher than the overall effect shown in the main model. When
compared against what the model would predict given two countries'
values in all the other covariates, tie density is expected to
increase by a factor of 2.4 for Orthodox countries, 3.1 for Islamic
countries, and by a whopping factor of 5.44 for Latin American
countries. Effects are positive but not significant for the Sinic and
Buddhist civilizations, possibly because they contain few countries.
Similarly insignificant are the effects for the Western and African
civilizations, though their coefficients are negative.
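The significance marks in Table 5 can be checked from the reported t-values. The sketch below uses a two-tailed test against the standard normal distribution, a large-sample approximation we assume here; the software behind the tables may use a different reference distribution, so borderline cases could differ. Helper names are hypothetical.

```python
import math


def two_tailed_p(t_value):
    """Two-tailed p-value for a t statistic under a standard normal reference."""
    return math.erfc(abs(t_value) / math.sqrt(2.0))


def significance_marks(p_value):
    """Marks using the thresholds in the table notes (* .10, ** .05, *** .01)."""
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""


if __name__ == "__main__":
    # T-values as reported in Table 5.
    table5_t = {"Sinic": 1.613, "Islamic": 6.428, "Latin American": 9.561,
                "Western": -1.094, "Orthodox": 2.069, "African": -1.456,
                "Buddhist": 1.581}
    for name, t in table5_t.items():
        p = two_tailed_p(t)
        print(f"{name}: p = {p:.3f} {significance_marks(p)}")
```

Under this approximation the Sinic and Buddhist estimates fall just above the .10 threshold, consistent with their unmarked coefficients.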

6. DISCUSSION 

Not all civilizations "survive" a regression analysis that controls
for the numerous economic and political factors that may affect
cross-country communication. The strong effects we see associated
with Islamic, Latin American and Orthodox countries demand further
explanation, however. For one reason or another, the countries in
these groups have stronger levels of association than the model
would predict. In this respect we cautiously assign a level of
validity to Huntington's contentions, with a few caveats. The first
issue was already mentioned: overlap between civilizations and other
factors contributing to countries' level of association. Huntington's
thesis is clearly reflected in the graph presented in Figure 3, but
some of these civilizational clusters are found to be explained by
other factors in Table 5. The second limitation concerns the fact
that we investigated a communication network. There is no necessary
"clash" between countries that do not communicate, and Huntington's
thesis was concerned primarily with ethnic conflict. Indeed, the
validity of Huntington's ideas with respect to ethnic conflict has
been disputed [20], and we limit ourselves to showing the validity,
at least partial, of this division for communication networks.

The third limitation concerns the data. The task of converting a
worldwide communication network with uneven coverage into a set of
comparable communication densities is not trivial. We hope

12 Due to insufficient country-pair observations, the model could not
be fit with a dummy variable for Hindu common civilization.



our work on this subject, presented in Section 3.2, will make a
contribution to addressing this problem. We are also confident that
the future growth of Computational Social Science will bring forth 
novel techniques for improving the estimation of such communica- 
tion densities, perhaps through the incorporation of richer sets of 
features into the rescaling. Our experience also suggests that future 
studies of global online networks would benefit from an explicit 
consideration of the influence of market share and Internet penetra- 
tion, and from the development of methods to account for potential 
biases due to these factors in network statistics. 

Our analysis of the determinants of between-country communication
likewise afforded an important opportunity to test a number of
theories at the global level. The findings (unsurprisingly) support
the idea that geography, transportation and administrative decisions
are all important determinants of between-country communication:
distance decreases density, as do visas, while direct flights
increase it. Our findings in the main model with respect to
contiguous borders and common European Economic Area membership
appear surprising, as they decrease rather than increase density once
the other variables in the model are controlled for. These curious
findings do raise the issue of potential problems with European
integration, as well as of the higher potential for conflict between
countries sharing borders, which may lead to less communication. We
advance these explanations only tentatively, however, as the
direction of these coefficients appears dependent on the model
specification.

When it comes to cultural factors, it is not just Huntington's civ- 
ilizations that matter. We also found important effects associated 
with common language, previous colonial relationships, as well as 
with Hofstede's uncertainty avoidance (UAI) measure. This latter 
finding suggests that countries with more uncertainty aversion are 
less likely to be connected - perhaps because maintaining interna- 
tional connections requires a certain degree of risk-taking. Like- 
wise less likely to be connected by social ties are countries that 
differ on this dimension, perhaps a reflection of the influence of un- 
derlying differences in social norms measured by this variable. The 
finding that countries that differ in the Individualism (IDV) mea- 
sure are more likely to connect appears dependent on model spec- 
ification, as is the result which suggests that countries with higher 
generalized trust are likely to have lower communication densities. 
We consider these findings interesting puzzles, but advancing an
explanation for them would be premature, given the effects'
instability. As expected, we find economics to have an important role
in shaping international social relations. Living in countries with 
higher GDP makes establishing and maintaining international con- 
nections easier, and countries with higher trade flows are also likely 
to have greater flows of people between them, and thus higher com- 
munication densities. We also observe an effect associated with 
hierarchy, as predicted by World Systems Theory: countries with 
dissimilar GDPs are more connected, the effect of such inequality 
increasing once controls are included. 



7. CONCLUSION 

The reality of globalization has become a commonplace of lay 
and scientific discourse alike. The promise of Computational Social
Science is to help scholars go beyond such observations, enabling
careful measurement of the world's social structures. Newly available
large, global datasets offer the possibility of an account of
international relations as observed between nationals rather than 
among nations. Our study illustrated how such an opportunity 
could be pursued with one particular dataset. It is even more excit- 
ing to consider the possibility of combining insights derived from 
multiple online datasets to produce a clearer picture of the world's 



social networks. We hope our study has shown the promise of the 
Internet in the study of our global mesh of civilizations. 